



whoami 


► Director of Engineering @ Etsy 

► Enterprise, Fraud, Security, Email, Fun 

► stringencoders: 

► C library for string processing 

► used by every ad server in the world 

► used in Chrome browser 

► http://code.google.com/p/stringencoders

The Next 14 Minutes


► Why is detecting SQLi hard 

► The algorithm behind libinjection

► The results 

► Next Steps 


Nick Galbreath 



Black Hat USA 2012


@ngalbreath 



Detecting SQLi 
from User Input 
is a Hard Problem 





It's Easy to Get Started 
with Regular Expressions! 

s/UNION\s+(ALL)?/i 

► At least two open-source WAFs use regular
expressions.

► Failure cases in closed-source WAFs also
indicate regexps.
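
To make the weakness concrete, here is a minimal sketch of the pattern above using POSIX regex.h. POSIX extended regular expressions have no \s, so [[:space:]] stands in for it; the function name naive_union_match is hypothetical and not taken from any WAF.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true when `input` matches a POSIX translation of the
 * slide's pattern: UNION, whitespace, optional ALL (case-insensitive). */
static bool naive_union_match(const char *input)
{
    regex_t re;
    bool hit;

    assert(input != NULL);
    if (regcomp(&re, "UNION[[:space:]]+(ALL)?",
                REG_EXTENDED | REG_ICASE | REG_NOSUB) != 0) {
        return false; /* pattern failed to compile */
    }
    hit = (regexec(&re, input, 0, NULL, 0) == 0);
    regfree(&re);
    return hit;
}
```

Under these assumptions, "1 UNION ALL SELECT 1" is flagged, but so is the harmless "Student union meeting at 5pm", while "1 UNION/**/SELECT 1" slips through because a comment replaces the whitespace.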

SQL IS HUGE


► Turing Complete! (sorta) 

► 1992 SQL Spec: bit.ly/10fmhZ

► 625 pages of plain text

► 2003 SQL Spec: bit.ly/0B5vfW

► 128 pages of pure BNF

► No one implements it exactly

► Everyone has extensions, exceptions, bugs

It's more complicated
than you think. 

► Recursive commenting rules 

► Even a single numeric literal can't be matched
with a single regexp.

► Really Loosely Typed 

► String rules - OMFG. You think you know but 
you have no idea. 

► Come see my talk at DEFCON this Friday at... 
4:20 pm 

RegExp Soup

(?:\)\s*when\s*\d+\s*then)|(?:"\s*(?:#|—|{))|(?:\/\*!\s?\d+)|(?:ch(?:a)?r\s*\(\s*\d)|(?:(?:(n?and|x?or|not)\s+|\|\||\&\&)\s*\w+\() 

(?:[\s()]case\s*\()|(?:\)\s*like\s*\() j(?:having\s*[^\s]+\s*[^\w\s])|(?:if\s?\([\d\w]\s*[=<>~]) 

(?:"\s*or\s*"?\d)|(?:\\x(?:23|27|3d))I(?:-.?'■$)|(?:(?:-["\\]*(?:[\d"]+|[^"]+"))+\s*(?:n?and|x?or|not|\|\||\&\&)\s*[\w"[+&!(§(),.-])|(?:[^\w\s]\w+\s*[|-] 
\s*"\s*\w) I (?:(a\w+\s+(and |or)\s* ["\d] + ) | (?:(a[\w-]+\s(and |or)\s*[''\w\s]) | (?: [''\w\s: ]\s*\d\W+[''\w\s]\s*".) | (?:\Winformation_schema |table_name\W) 

(?: "\s*\*. + (?:or|id)\W*"\d)|(?:\''")|(?:''[\w\s"-] + (?<=and\s)(?<=or\s)(?<=xor\s)(?<=nand\s)(?<=not\s)(?<=\i\|)(?<=\&\&)\w+\()|(?:"[\s\d]*[''\w\s]+\W*\d 
\W*.*["\d]) I (?:"\s*[''\w\s?]+\s*[''\w\s]+\s*") | (?:"\s*[''\w\s]+\s*[\W\d] .*(?:#|--)) | (?: ".*\*\s*\d) | (?:"\s*or\s [''\d] +[\w-] + .*\d) | (?: [()*<>%+-] [\w-] + [''\w\s] 
+'■[-,]) 

(?:\d"\s+"\s+\d)I(?:''admin\s*"|(\/\*)+"+\s?(?:--|#|\/\*|{)?)|(?:"\s*or[\w\s-]+\s*[+<>=(),-]\s*[\d"])|(?:"\s*[''\w\s]?=\s*")|(?:"\W*[+=]+\W*")|(?:"\s* [ != | ] 
[\d\s!=+-]+.*["(].*$)I(?:"\s*[!=|][\d\s! = ]+.*\d+$)|(?:"\s*like\W+[\w"(])|(?:\sis\s*0\W)|(?:where\s[\s\w\. ,-]+\s=)|(?:"[<>~]+") 

(?:union\s*(?:allIdistinct|[(!(a]*)?\s*[([]*\s*select)|(?:\w+\s+like\s+\")|(?:like\s*"\%)|(?:"\s*like\W*["\d])|(?:"\s*(?:n?and|x?or|not |\|\||\&\&)\s+[\s 
\w]+=\s*\w+\s*having)|(?:"\s*\*\s*\w+\W+")|(?:"\s*[''?\w\s=.,;)(]+\s*[((a"]*\s*\w+\W+\w)|(?:select\s*[\[\]()\s\w\.,"-]+from)|(?:find_in_set\s*\() 

(?:in\s*\(+\s*select)|(?:(?:n?and|x?or|not |\|\||\&\&)\s+[\s\w+]+(?:regexp\s*\(|sounds\s+like\s*"|[=\d]+x))|("\s*\d\s*(?:—|#))|(?:"[%&<>''=]+\d\s*(=| 
or))I(?:"\W+[\w+-]+\s*=\s*\d\W+")j(?:"\s*is\s*\d.+"?\w)|(?:"\|?[\w-]{3,}[''\w\s.,]+")|(?:"\s*is\s*[\d.]+\s*\W.*") 

(?:[\d\W]\s+as\s*["\w]+\s*from)|(?:"'[\W\d]+\s*(?:union|select|create|rename|truncate|load|alter|delete|update|insert|desc))|(?:(?:select|create|rename| 
truncateI load latter I delete I update I insert Idesc)\s+(?:(?:group_)concat jchar|load_file)\s?\(?)|(?:end\s*\);)|("\s+regexp\W)|(?:[\s(]load_file\s*\() 
(?:(a.+=\s*\(\s*select)|(?:\d+\s*or\s*\d+\s*[\-+])|(?:\/\w+;?\s+(?:having|and|or|select)\W)|(?:\d\s+group\s+hy.+\()|(?:(?:;|#|--)\s*(?:drop|alter))|(?: 
(?:;|#| — )\s*(?:update I insert)\s*\w{2,})|(?:[-\w]SET\s*(a\w+)|(?:(?:n?and|x?or|not |\|\||\&\&)[\s(]+\w+[\s)]*[!=+] +[\s\d]*["=()]) 

(?:"\s+and\s*=\W) | (?:\(\s*select\s*\w+\s*\() | (?:\*\/from) | (?:\+\s*\d+\s*\+\s*(a) | (?:\w"\s*(?: [-+= |(a]+\s*) +[\d(]) | (?:coalesce\s*\( |(a(a\w+\s*[^\w\s]) | (?:\W! 
+"\w)I(?:";\s*(?:if|while|begin))|(?:"[\s\d]+=\s*\d)|(?:order\s+by\s+if\w*\s*\()|(?:[\s(]+case\d*\W.+[tw]hen[\s(]) 

(?:(select|;)\s+(?:benchmark|if|sleep)\s*?\(\s*\(?\s*\w+) 

(?:create\s+function\s+\w+\s+returns)|(?:;\s*(?:select|create|rename|truncate|load|alter|delete|update|insert|desc)\s*[\[(]?\w{2,}) 
(?:alter\s*\w+.*character\s+set\s+\w+)|(";\s*waitfor\s+time\s+")|.*:\s*goto) 

(?:procedure\s+analyse\s*\()|(?:;\s*(declare|open)\s+[\w-]+)|(?:create\s+(procedure|function)\s*\w+\s*\(\s*\)\s*-)|(?:declare[''\w]+[(a#]\s*\w+)|(exec\s*\ 
(\s*(a) 

(?:select\s*pg_sleep)|(?:waitfor\s*delay\s?"+\s?\d)|(?:;\s*shutdown\s*(?:;|—|#|\/\*|{)) 

(?:\sexec\s+xp_cmdshell)|(?:"\s*!\s*["\w])|(?:from\W+information_schema\W)|(?:(?:(?:current_)?user|database|schema|connection_id)\s*\([''\)]*)|(?:";? 
\s*(?:select|union|having)\s*[''\s])|(?:\wiif\s*\()|(?:exec\s+master\.)|(?:union select (a)|(?:union[\w(\s]*select)|(?:select.*\w?user\()|(?:into[\s+]+ 
(?:dump|out)file\s*") 

(?:merge.*using\s*\()|(execute\s*immediate\s*")|(?:\W+\d*\s*having\s*[''\s\-])|(?:match\s*[\w(),+-]+\s*against\s*\() 

(?:,[)\da-f"]|\Z|[''"]+))|(?:\Wselect.+\W*from)|((?:select|create|rename|truncate|load|alter|delete|update|insert|desc)\s*\(\s*space\s*\() 

(?:\[\$(?:ne|eqIIte?|gte?|n?in|mod jall|size|exists|type|slice|or)\]) 

(?:(sleep\((\s*)(\d*)(\s*)\)|benchmark\((.*)\,(.*)\))) 

(?:(union(.*)select(.*)from)) 

(?: ''(-00000234561429496729514294967296121474836481214748364710000012345|-2147483648|-21474836491000002345612.2250738585072007e-308|le309)$) 

Guns and Butter


► In 2005, right here at Black Hat, Hansen and
Patterson presented:

Guns and Butter: Towards Formal Axioms of
Input Validation (http://bit.ly/OBe7mJ)

► ... formally proved that for any regex validator, we 
could construct either a safe query which would be 
flagged as dangerous, or a dangerous query which 
would be flagged as correct. 

► (summary from libdejector documentation) 

Existing WAFs


► Visual inspection shows bugs

► Little evidence of testing

► Little or no evidence of false-positive testing

► Closed-source WAFs have zero accountability
(e.g. there is no formal disclosure of what they
detect or not, and how they do it)

CAN WE DO BETTER?

libinjection





Key Insight 


► To work, a SQLi attack must parse as SQL
together with the original query.

► "Is it a SQLi attack?" becomes
"Could it be a SQL snippet?"

Only 3 Contexts


User input is only "injected" into SQL in three 
ways: 

► "As-is"

► Inside a single-quoted string

► Inside a double-quoted string

(I suppose another would be inside a comment,
but we can't do everything)
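
One way to picture the three contexts is to hand the parser the input three times: unchanged, and with an opening single or double quote in front, so the input reads as the tail of a quoted string. The sketch below assumes that approach; the enum and the function build_candidate are hypothetical names, not libinjection's API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

enum inject_context { CTX_AS_IS, CTX_SINGLE_QUOTED, CTX_DOUBLE_QUOTED };

/* Writes into `out` the text a parser would see for the given context.
 * Returns false if the result does not fit in `out_size` bytes. */
static bool build_candidate(enum inject_context ctx, const char *input,
                            char *out, size_t out_size)
{
    const char *prefix = "";
    int n;

    assert(input != NULL && out != NULL);
    if (ctx == CTX_SINGLE_QUOTED) {
        prefix = "'";
    } else if (ctx == CTX_DOUBLE_QUOTED) {
        prefix = "\"";
    }
    n = snprintf(out, out_size, "%s%s", prefix, input);
    return n >= 0 && (size_t)n < out_size;
}
```

A detector built this way would flag the input if any of the three candidates looks like a SQL snippet.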






Identification of SQL 
snippets without context 

is hard 


► 1-917-660-3400 my phone number or an 
arithmetic expression? 

► @ngalbreath: my Twitter account or a SQL
variable?





Existing SQL Parsers 


► Only parse their flavor of SQL 

► Not well designed to handle snippets

► Hard to extend 

► Worried about correctness 

... so I wrote my own! 


Tokenization


► Converts input into a stream of tokens 

► Uses "master list" of keywords and functions 
across all databases. 

► Handles comments, strings, literals, weirdos.
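
A rough sketch of what such a tokenizer does, under heavy simplification. This is not libinjection's code: the struct, the function names, the tiny keyword list and the one-letter type codes beyond 'k', '1' and 'o' are assumptions for illustration. It also treats every /* ... */ as a plain comment, so MySQL's /*! ... */ form is not handled.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

struct sketch_token {
    char type;          /* 'k' keyword, '1' number, 'o' operator, 's' string,
                           'c' comment, 'n' name, or the punctuation itself */
    const char *start;  /* points into the caller's input */
    size_t len;
};

/* Case-insensitive comparison of s[0..n) against an upper-case word. */
static int word_is(const char *s, size_t n, const char *upper)
{
    size_t i;
    if (strlen(upper) != n) return 0;
    for (i = 0; i < n; i++) {
        if (toupper((unsigned char)s[i]) != upper[i]) return 0;
    }
    return 1;
}

static char classify_word(const char *s, size_t n)
{
    /* Tiny stand-in for the "master list" of keywords. */
    static const char *const keywords[] = { "SELECT", "FROM", "WHERE", "UNION" };
    static const char *const word_ops[] = { "AND", "OR", "NOT" };
    size_t i;
    for (i = 0; i < sizeof keywords / sizeof keywords[0]; i++) {
        if (word_is(s, n, keywords[i])) return 'k';
    }
    for (i = 0; i < sizeof word_ops / sizeof word_ops[0]; i++) {
        if (word_is(s, n, word_ops[i])) return 'o';
    }
    return 'n';
}

/* Splits `s` into at most `max` tokens; returns how many were written. */
static size_t sketch_tokenize(const char *s, struct sketch_token *out, size_t max)
{
    size_t count = 0;
    assert(s != NULL && out != NULL);
    while (*s != '\0' && count < max) {
        const char *start = s;
        unsigned char c = (unsigned char)*s;
        char type;
        if (isspace(c)) { s++; continue; }
        if (s[0] == '/' && s[1] == '*') {          /* comment */
            const char *end = strstr(s + 2, "*/");
            s = (end != NULL) ? end + 2 : s + strlen(s);
            type = 'c';
        } else if (isdigit(c)) {                   /* integer literal */
            while (isdigit((unsigned char)*s)) s++;
            type = '1';
        } else if (isalpha(c) || c == '_') {       /* keyword, operator word, name */
            while (isalnum((unsigned char)*s) || *s == '_') s++;
            type = classify_word(start, (size_t)(s - start));
        } else if (c == '\'' || c == '"') {        /* quoted string, no escapes */
            s++;
            while (*s != '\0' && *s != (char)c) s++;
            if (*s != '\0') s++;
            type = 's';
        } else {                                   /* punctuation or operator */
            s++;
            type = (strchr("(),;", c) != NULL) ? (char)c : 'o';
        }
        out[count].type = type;
        out[count].start = start;
        out[count].len = (size_t)(s - start);
        count++;
    }
    return count;
}
```

Calling sketch_tokenize on "select 1 and 2 > 1" yields six tokens with types k, 1, o, 1, o, 1.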

select 1 /*!00000AND 2 > 1 */

[('k', 'SELECT'),  // keyword
 ('1', '1'),       // number
 ('o', 'AND'),     // operator
 ('1', '2'),       // number
 ('o', '>'),       // operator
 ('1', '1')]       // number

Meet the Tokens


► none/name

► variable

► string

► regular operator

► unknown

► number

► comment

► keyword

► group-like operation

► union-like operator

► logical operator

► function

► comma

► semi-colon

► left parens

► right parens

Merging,

Specialization, 

Disambiguation 

► "IS", "NOT" ==> "IS NOT" (single operator)

► "NATURAL", "JOIN" ==> "NATURAL JOIN"

► "+" (operator) ==> "+" (unary operator)

► "COS" (function), "1" (number) ==>
"COS" is not a function, since it is not followed by "("
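
The merging and disambiguation rules above can be sketched as a single pass over the token array. Everything here is illustrative: the struct, the function name, the type codes ('f' for function, 'n' for name) and the assumption that token text is already upper-cased are not taken from libinjection.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct merge_token {
    char type;       /* 'k' keyword, 'o' operator, 'f' function, 'n' name, ... */
    char text[24];   /* assumed to be upper-cased already */
};

/* Rewrites `t` in place and returns the new token count. */
static size_t merge_and_specialize(struct merge_token *t, size_t n)
{
    size_t in = 0, out = 0;

    assert(t != NULL || n == 0);
    while (in < n) {
        struct merge_token cur = t[in];
        int has_next = (in + 1 < n);

        if (has_next && strcmp(cur.text, "IS") == 0 &&
            strcmp(t[in + 1].text, "NOT") == 0) {
            strcpy(cur.text, "IS NOT");        /* two tokens become one operator */
            cur.type = 'o';
            in += 2;
        } else if (has_next && strcmp(cur.text, "NATURAL") == 0 &&
                   strcmp(t[in + 1].text, "JOIN") == 0) {
            strcpy(cur.text, "NATURAL JOIN");  /* two tokens become one keyword */
            cur.type = 'k';
            in += 2;
        } else {
            /* A function name not followed by "(" is just a name. */
            if (cur.type == 'f' && !(has_next && t[in + 1].type == '(')) {
                cur.type = 'n';
            }
            in += 1;
        }
        t[out++] = cur;
    }
    return out;
}
```

With this sketch, the tokens X, IS, NOT, NULL shrink to three tokens, and COS followed by a number is downgraded from function to name.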

Folding


► This step isn't actually needed for detection, but
it is needed to reduce false positives.

► Converts simple arithmetic expressions into a
single value (without trying to evaluate them).

► 1-917-660-3400 ==> "1"
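
Folding can be sketched as collapsing every run of "number, arithmetic operator, number, ..." into one number token without computing anything. The struct and function names below are hypothetical, and limiting the arithmetic operators to + - * / is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct fold_token {
    char type;         /* '1' number, 'o' operator, other codes untouched */
    const char *text;
};

static int is_arithmetic_op(const struct fold_token *t)
{
    return t->type == 'o' && strlen(t->text) == 1 &&
           strchr("+-*/", t->text[0]) != NULL;
}

/* Collapses number (op number)* runs in place; returns the new count. */
static size_t fold_arithmetic(struct fold_token *t, size_t n)
{
    size_t in = 0, out = 0;

    assert(t != NULL || n == 0);
    while (in < n) {
        struct fold_token cur = t[in++];
        if (cur.type == '1') {
            int folded = 0;
            /* Swallow each following "operator number" pair. */
            while (in + 1 < n && is_arithmetic_op(&t[in]) &&
                   t[in + 1].type == '1') {
                in += 2;
                folded = 1;
            }
            if (folded) cur.text = "1";  /* placeholder, never evaluated */
        }
        t[out++] = cur;
    }
    return out;
}
```

The seven tokens of 1-917-660-3400 collapse into a single number token, while a comparison such as 2 > 1 is left alone because > is not arithmetic.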


Knows nothing about SQLi


► So far this is purely a parsing problem. 

► Knows nothing about SQLi (which is evolving) 

► Can be 100% tested against any SQL input 
(not SQLi) for correctness. 

Fingerprints


► The token types of a user input form a hash or 
a fingerprint. 

► select 1 /*!00000AND 2>1*/ 

► k1o1o1

► Now let's generate fingerprints from Real 
World Data. 

► Can we distinguish between SQLi and benign 
input? 
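
Forming a fingerprint is then just concatenating the one-character token types. A minimal sketch, with a hypothetical function name and a caller-chosen cap on how many tokens are used:

```c
#include <assert.h>
#include <stddef.h>

/* Concatenates up to `max_tokens` type codes into `out` as a C string.
 * Returns the fingerprint length. */
static size_t make_fingerprint(const char *types, size_t n, size_t max_tokens,
                               char *out, size_t out_size)
{
    size_t i, len = 0;

    assert(types != NULL && out != NULL && out_size > 0);
    for (i = 0; i < n && len < max_tokens && len + 1 < out_size; i++) {
        out[len++] = types[i];
    }
    out[len] = '\0';
    return len;
}
```

Feeding it the six token types of the example query gives the string k1o1o1.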


Training on SQLi


► Parse known SQLi attacks from 

► SQLi vulnerability scanners 

► Published reports 

► SQLi How-Tos

► >32,000 total 

Training on Real Input


► Hundreds of millions of user inputs from Etsy's logs
were also parsed.

► Large enough to get a good sample (Top 50
USA site)

► Old enough to have lots of odd ways of
handling query strings, etc.

► Full-text search with a diverse subject
domain

How many tokens are
needed to determine if a 
user input 
is SQLi or not? 






Five. No matter how long the input is.





480 out of 1,048,576 possible fingerprints are SQLi


n,(k{ l))Un n)ok{ s)Unk DolB n,{kl s&n&s k{vv) sosos loks, sk)&l Dolf Dolk n);k& l&lBf Dolo s&os n,{kf s;k; n&lol 
lo(f{ f{k,{ lokl loksc solf( sk)&f soso{ 1),(1 s))&{ s))&l &f{)o nok(l kl,lk l&f(n sokol lUnkl lok{l n))kk lUnk( soko{ 
l)o(l s)okl sov&s n;kn( nok{k s))&f sovso Dokl s))&o l)ok( s&sos n&k(l s&vso sole sUk(k kl,l, l)o{n l)o{k lBf{l lkf(l 
s&kol s&k{o l)kl sk);k l&f{l lUnkf s)kl sos&{ l&(kl l))on s&kc nUk(o l;ko{ 1)B1 soke n)ol& noloo l&{kn s&loo solol 
s)olf l&{kf s)olk s)olo f(l)o n&lo( s)olB lokv, sk)&{ l;kok sokl f(k() lUkv, s&lof l&loo n&lf{ 1))); !)))& l&lof sovol 
s&lov s)&lo sonol lo({f 1)))) lo{s) s)&lf l&lov lUklk n))ok kOok nkkse lUkle n))of s&(kl s)&lB n;kks n)o{k kf{n, f(f{l 
sovov s&lo{ sovos s&lol vokl, sovok sUkl lo(({ l)))k l&lol f{f{) l)))o n))o{ 1)))U klk(k lUkl, l&lf( so{s) 1)))B f(n() 
n))ol s)&{l l)of( l,{kl sk)Bl f(l,f l,{k{ lBk(l lonos lolf( l,f{l IBle s&oke s;ko{ sklos s&oko lonol l,{kf sBl l));k 
s;kf{ n)kks s;kok sklol s;k{( lo{{l lolBf so(f{ n;kf{ s&k{l l&lo{ nof(l s);kk skle l))o{ s);ko s);kn sokl, s;k(l l)kks 
s);kf so(os solov s;kl, l))Uk soknk s))kl l)Blo l)Ble n);k( n;kok s;k(o s);k{ soklo sokle sf(n, s);k& sBl&s s;klo sUnol 
s))kk n);kf l&sol sokn, n;ko( n);kk n);kn n);ko s&lon sof{k n;k&k klo{s sonos skl&l sof(f losol l;kne sUknk fOof n&{l) 
s&ko{ sof{) oklol n,f{l lo{l) s;kkn s;kks lo(kn sof{l sUkn, s)kle l;kn( s)klo s;k&k skks s;n:k no(ol s))o{ k(ok{ so{ks 
so(kk so(kn so(ko s))ol n)&{k olkf{ s))ok ;kkne skkse so(kl n;k{{ s&o(l s))of so{k) n;k(l n&(ol s&kok sov:o s)of( sU{kk 
sU(kn f{v,l sk)of l)&f( sk)ok nolf{ sU{ks oUkl, lokle s&(l) s&kos loklk sUnkl Dono lof(l solo{ s;knn s;knk lofO vUkl, 
nolof l&nol sk)ol s)Bl l)&o( sUkl& s&{k) loDo fO&f sk)o( n&f{l solof l)on& 1)B1& soloo nolol solok lokl, lof{n nolo{ 
solos s;kn{ lof{f sUnkf lo{n) s&los no{kl n)))o n)))k lkk{l l;k{o l){)s s&klo s)Bl& n)&lf n))&{ sUkl, n)&lo nol&l n))); 
sf(l) l;k{l n)))& sokf( l;k{( ookl, n)of{ sUkle s)Ble n&(kl sUklo s)Blo lUkf{ okkkn s&vos s)o{k D&lf lUkl l))&o l))&f 
D&IB l)&{k s,l), f{lol s)&f{ s)o{l sUkf{ s&k&s lokf{ !)&{! !))&! l;kf{ !))&{ sokos l))ok lolof lo{lo Ikkse loloo lUk{k 
l))of lolov Ukkkn l,(f( lok{k solUk s&lf{ sokok of(l) l;k&k kf{l) sk)kl s&v:o sok&s n)olo n)olf sUn{k lolo( lolol l))ol 
sov&l n));k n))&f sk)kk s)&{k l)Unk n))&l sU((k l)klo l);kk s;kve l);ko l);kn l)kle s;kvk l);kf lUks, s&o{k l);k& s)&o{ 
s&(lo s&f{) 1,1), l);k( sk)Un sk)Uk s&f(l D&lo lUkse nUnk( so{{k lolkf s&lBf l))kk kvk(l n&olo f(l)& &f{l) l))kl so{({ 
s))Un s))Uk n,(f( DUkl s),{l s&knk 1))B1 s)kks lUk no(l) n)&f{ s)ok{ s))Bl sos l&(lo s)Ukl s));k so{l) l&o(l sok(l 
nUk(k n&lof IBl sBle n&loo so{lo Ikle sok(s sok{o sok{k so{{s solkf l;kks s)))B sf(s) l&olo n)klo s)))U sonkl kf{l, 
lo(kf l,s), s)))k sol&l s)))o s&nos s&lUk s&olo lo(kl solBf s;k[k sBlos ofOo s;k[n s)))& s&(f{ sol&s s&nol sol&o s))); 

More token types may be added to help
reduce false positives.
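
With the SQLi fingerprints collected, detection comes down to a set lookup. The sketch below uses a binary search over a sorted table holding three fingerprints copied from the list above; the table, the function names and the use of bsearch are illustrative choices, not a description of how libinjection stores its data. The helper fingerprint_space only checks the arithmetic: 1,048,576 equals 16 to the 5th power, which matches 16 token types in 5 positions (that reading is an inference).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* bsearch comparator: key is a C string, elem points at a table entry. */
static int compare_fingerprint(const void *key, const void *elem)
{
    return strcmp((const char *)key, *(const char *const *)elem);
}

static bool is_sqli_fingerprint(const char *fp)
{
    /* Illustrative subset only; must stay sorted in strcmp order. */
    static const char *const known[] = { "s&sos", "sos", "sosos" };

    assert(fp != NULL);
    return bsearch(fp, known, sizeof known / sizeof known[0],
                   sizeof known[0], compare_fingerprint) != NULL;
}

/* Number of distinct fingerprints: token_types raised to positions. */
static unsigned long fingerprint_space(unsigned long token_types,
                                       unsigned positions)
{
    unsigned long total = 1;
    while (positions-- > 0) {
        total *= token_types;
    }
    return total;
}
```

Any fingerprint not in the table is treated as benign, so the false-positive rate depends entirely on how well the table was trained.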

The Library

► C, logic is under 1000 LOC

► No memory allocation 
(caller makes a copy of input) 

► Fixed, stable memory usage 

► No threads 

► 100k query strings can be checked per 
second 

► Could go even faster 






Sample Usage 


sfilter sf;   // on stack, ~500 bytes
const char* ucg = "my user input";
bool issqli = is_sqli(&sf, ucg, strlen(ucg));

// tada

Metadata on the input is in struct sfilter
(names subject to change, cleanup).

Test Cases


► All input test cases available 

► Including false positives found along the way

► Code coverage reports 

Python Prototype


► Algorithm in Python as well

► Not as up-to-date as the C version 

► Working on it 

► Runs under PyPy (and quite fast) 

Make Existing Systems
Work Better 

► The Tokenizer could be ripped out, to make a 
"SQL normalizer/simplifier" 

► all white space normalized 

► all comments removed 

► all numbers in various flavors converted to "1" 

► all strings converted to a fixed value "foo" 

► Makes existing regular expressions work 
better and detect more. 
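
A minimal sketch of such a normalizer, assuming a much simpler lexer than the real tokenizer: it collapses whitespace, drops /* ... */ comments, rewrites integer and decimal literals to 1, and rewrites quoted strings to 'foo'. The function name normalize_sql is hypothetical, and escapes inside strings, hex literals and other comment styles are deliberately ignored.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Writes a normalized copy of `s` into `out`; false if it does not fit. */
static bool normalize_sql(const char *s, char *out, size_t cap)
{
    size_t n = 0;
    bool need_space = false;

    assert(s != NULL && out != NULL && cap > 0);
    out[0] = '\0';
    while (*s != '\0') {
        unsigned char c = (unsigned char)*s;
        const char *piece = s;
        size_t len;

        if (isspace(c)) {                      /* collapse whitespace */
            s++;
            need_space = (n > 0);
            continue;
        }
        if (s[0] == '/' && s[1] == '*') {      /* drop comment, keep a gap */
            const char *end = strstr(s + 2, "*/");
            s = (end != NULL) ? end + 2 : s + strlen(s);
            need_space = (n > 0);
            continue;
        }
        if (isdigit(c)) {                      /* any number becomes 1 */
            while (isdigit((unsigned char)*s) || *s == '.') s++;
            piece = "1";
            len = 1;
        } else if (c == '\'' || c == '"') {    /* any string becomes 'foo' */
            s++;
            while (*s != '\0' && *s != (char)c) s++;
            if (*s != '\0') s++;
            piece = "'foo'";
            len = 5;
        } else if (isalpha(c) || c == '_') {   /* words are copied as-is */
            while (isalnum((unsigned char)*s) || *s == '_') s++;
            len = (size_t)(s - piece);
        } else {                               /* single punctuation char */
            s++;
            len = 1;
        }
        if (n + (need_space ? 1 : 0) + len + 1 > cap) return false;
        if (need_space) out[n++] = ' ';
        memcpy(out + n, piece, len);
        n += len;
        out[n] = '\0';
        need_space = false;
    }
    return true;
}
```

After this pass, an input such as "1  UNION/**/ALL SELECT 'abc', 42" reads "1 UNION ALL SELECT 'foo', 1", which a simple whitespace-based regular expression can match.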


Great for Fuzzers


► The SQLi fingerprints are actually a great
source of templates for fuzzers and SQLi
generators

► Take fingerprint and turn it back into SQL 
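
Turning a fingerprint back into SQL can be sketched by substituting one sample token per type code. The mapping below is an assumption for illustration: the slides confirm only 'k' (keyword), '1' (number) and 'o' (operator), and the other letters and every sample value are guesses. A real fuzzer would pick many different samples per type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One sample token per type code; NULL for codes this sketch does not know. */
static const char *sample_for_type(char type)
{
    switch (type) {
    case 'k': return "SELECT";
    case '1': return "1";
    case 'o': return "=";
    case 's': return "'a'";    /* assumed: string */
    case '&': return "OR";     /* assumed: logical operator */
    case 'U': return "UNION";  /* assumed: union-like operator */
    case 'n': return "x";      /* assumed: name */
    case 'f': return "ABS";    /* assumed: function */
    case '(': return "(";
    case ')': return ")";
    case ',': return ",";
    case ';': return ";";
    default:  return NULL;
    }
}

/* Expands `fp` into space-separated sample SQL; false on unknown code
 * or if the result does not fit in `cap` bytes. */
static bool expand_fingerprint(const char *fp, char *out, size_t cap)
{
    size_t n = 0;

    assert(fp != NULL && out != NULL && cap > 0);
    out[0] = '\0';
    for (; *fp != '\0'; fp++) {
        const char *piece = sample_for_type(*fp);
        size_t sep = (n > 0) ? 1 : 0;
        size_t len;

        if (piece == NULL) return false;
        len = strlen(piece);
        if (n + sep + len + 1 > cap) return false;
        if (sep) out[n++] = ' ';
        memcpy(out + n, piece, len);
        n += len;
        out[n] = '\0';
    }
    return true;
}
```

Under this mapping the fingerprint s&sos expands to 'a' OR 'a' = 'a'.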

Available Now on GitHub


► https://github.com/client9/libinjection 

► BSD License 

(only to track how this gets used) 

► Use it. 

Help!


► More SQLi test cases!

► More real-world test cases 

► Missing some PGSQL / Oracle string insanity 

► Need better understanding of non-ASCII 
usage 

► Porting to other languages 
(it's not that hard). 

More Analysis at DEFCON 20


New Techniques in SQLi Obfuscation 

SQL never before used in SQLi 
http://slidesha.re/MfOiNR 

Friday, July 27, 2012, 4:20pm at the Rio

https://github.com/client9/libinjection

http://slidesha.re/0Bch5k 

@ngalbreath

nickg@client9.com 

nickg@etsy.com 


Thanks for coming by! 



